Reasearchers in a wide range of fields have studied transmission processes in dynamic networks for several decades. Key contributions have been published addressing problems in transit logistics (Cooke and Halsey, 1966), network routing, livestock disease transmission (Dube, et al, 2008, Bajardi, et. al. 2011), diffusion of innovations, percolation networks, “viral” social media, and epidemiology.
Characterising static networks with useful descriptive statistics is already quite challenging, and the added temporal dimension greatly increases the complexity of the problem. Relatively few generalizeable descriptive statistics for dynamic networks have been published. Those that exist frequently operate by dividing the longitudinal network up into a series time bins, aggregating together ties within those bins into static networks, and calculating a network statistic on each bin. This is often a useful approach, and one that can return graph statistics as a time series (cite papers, tsna), but number of issues arise. Determining a bin duration appropriate for an analysis can be difficult (Sulo, et. al. 2010). Longer duration bins may be required for the graph to be sufficently connected to measure higher order network properties, yet as more time is aggregated the loss if the tie ordering information begins to inflate the assesed connectivity. Although the density-inflating effects of short-duration aggregations may not be as severe as the distortions caused by collapsing over the entire observation period, it still will collapse tie sequence effects and add a bias to the network measures.
The goal of this paper is to outline and demonstrate several algorithmic approaches to characterising the transmission potential in longitudinal networks, primarily by computing sets of possible temporally permissable traversals of the network. There is a growing literature which uses simulated epidemics across longitudinal networks (Dube, et. al. 2008). Epidemic models require specifying their own set of parameters and the specific details of individual transmission processes may vary widly across phenomena, perhaps depending on quantities that are difficult or unethical (in some epidemiology cases) to measure experimentaly. Our hypothesis is that computing properties of the of “substrate” of dynamic network ties which facilitate transmission will allow inferences about bounds on the size and speed of potential epidemic spread.
We are considering here a class networks where the existance of edges are presumed to constrain possible transmissions between vertices. The assumption is that the network has either been fully observed (simulated, pre-scheduled) before the transmission process takes place, or the transmission process is independent of the edge dynamics (unable to influence formation or dissolution of ties). So we might include the network fromed by the routing of cargo through an previously scheduled transit network, but not a network of phone calls where people may decide whom to call based on the information they recieve in a previous call. Not citation networks either?
For the purposes of this paper we consider a dynamic network to be a collection of vertices and incident undirected edges (or directed arcs) in which each element has an associated set of activity spells that determine when the elements are availible for transmission. An activity spell has onset and terminus time values defining a right-open interval. Each element may have multiple spells associated with it (meaning that an edge may toggle on and off multiple times) but the spells must be non-overlaping. A dynamic network also has a (sometimes implicit) observation window defining the interval of time (generally inclusive of all the finite edge and vertex spells) over which the network was known to be observed (or simulated).
This definition of dynamic networks is inclusive of (but not restricted to) the common representation of a dynamic network as a sequence of static networks or stack of time-indexed matricies corresponding to sucessive discrete time units or survey waves. For example, the following sequence of adjacency matrices describes a three-vertex directed network observed at three time points.
A B C
A 0 1 0
B 1 0 1
C 0 0 0
A B C
A 0 0 0
B 0 0 1
C 1 0 0
A B C
A 0 1 0
B 0 0 1
C 0 1 0
The same information represented as the activity spell matrices for each edge. Each row is a spell, the first column is the onset, and the second is the terminus.
edgeSpells
$`edge B-A`
[,1] [,2]
[1,] 0 1
$`edge A-B`
[,1] [,2]
[1,] 0 1
[2,] 2 3
$`edge B-C`
[,1] [,2]
[1,] 0 3
$`edge C-A`
[,1] [,2]
[1,] 1 2
$`edge C-B`
[,1] [,2]
[1,] 2 3
This distinction in representation is important because the “edge spell” representation permits independently assigning real-valued activity times to each edge and allows us to explicitly consider the order and sequence of edge activations when searching for potential paths.
A forward temporal path is a sequence of vertices and edges in a dynamic network such that the start times of successive elements are greater than or equal than those of the previous. The path is a traversal of the network (occuring within its observation window) that respects the constraints of edge activity spells and permits “waiting” at intermediate vertices for edges to form in the future. Due to the required ordering of events, a temporal path is a directed traversal, even if the underlying network itself is undirected. Bui Xuan, et al (2002) call a permissable time-respecting path between a pair of vertices (a source and destination) a journey.
To give a concrete example, if a network describes the time-evolution of contacts between people during which disease transmission could occur, a sequence of infection events spreading out through network from a single person would be a temporal path. Likewise, a series of time-stamped messages forwarding a cute cat picture would be a temporal path in a dynamic email network. For simplicity we assume that once a path (message) reaches a vertex, that vertex remains “infected” and can transmit to all of its future contacts.
Depending on network topology, there may be zero or arbitrairly many possible temporal paths between a single pair of vertices. Temporal paths are often not symmetric: if vertex i can reach vertex j, it does not follow that j can necessairly reach i. This means that a dynamic network can be connected in the static sense (edges exist linking all of the vertices in a single component) but not fully reachable via temporal paths because an edge needed to complete the path becomes inactive before the adjacent edge activates.
Determining if a given path between two vertices is a permissable temporal path or journey requires specifying a starting time point from which the path should be evaluated. This is often, but not always, the beginning of the network observation window. A journey that is possible at at a given time point may not be feasible if it was ininitiated later on. A complete specification of allowable paths also requires being explicit about the duration of time required for a path to cross an edge – further details below.
Many authors have provided basic examples illustrating the differences betweeen vertices reachable accoring to a static or time-aggregated view of a network in contrast to an allowable temporal path, which we won’t reproduce here (CITE). But it is worth working through several path examples in a trivial network. In the network diagram below, the labels on the edges indicate their activity spells.
For a network this simple it is possible to describe all of the possible temporal paths paths from vertex A.
Of course, vertex B is reachable from A from time 0 until time 2. Vertex C is reachable from A from time 3 until time 4 – allowing for waiting at B for one time unit. Vertex D is never reachable on a temporal path from A, because the C–D edge dissolves before the B–C edge forms. Vertex E is reachable from A from time 1 until 3, and again from time 5 until 7. If we did not allow waiting on vertices, E could only be reached from A from time 1 until 2, and C would not be reachable. Also, if we started our path search from A after time 2 then none of the other vertices are reachable from A.
## Loading required package: ape
##
## Attaching package: 'ape'
## The following object is masked from 'package:sna':
##
## consensus
## Loading required package: EpiModel
## Loading required package: deSolve
##
## Attaching package: 'deSolve'
## The following object is masked from 'package:graphics':
##
## matplot
## Loading required package: tergm
## Loading required package: statnet.common
## Loading required package: ergm
##
## ergm: version 3.5.1, created on 2015-10-18
## Copyright (c) 2015, Mark S. Handcock, University of California -- Los Angeles
## David R. Hunter, Penn State University
## Carter T. Butts, University of California -- Irvine
## Steven M. Goodreau, University of Washington
## Pavel N. Krivitsky, University of Wollongong
## Martina Morris, University of Washington
## with contributions from
## Li Wang
## Kirk Li, University of Washington
## Skye Bender-deMoll, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("ergm").
## NOTE: If you use custom ERGM terms based on 'ergm.userterms'
## version prior to 3.1, you will need to perform a one-time update
## of the package boilerplate files (the files that you did not write
## or modify) from 'ergm.userterms' 3.1 or later. See
## help('eut-upgrade') for instructions.
##
## tergm: version 3.4-14107.1-2015.10.18-17.37.57, created on 2015-10-18
## Copyright (c) 2015, Pavel N. Krivitsky, University of Wollongong
## Mark S. Handcock, University of California -- Los Angeles
## with contributions from
## David R. Hunter, Penn State University
## Steven M. Goodreau, University of Washington
## Martina Morris, University of Washington
## Nicole Bohme Carnegie, New York University
## Carter T. Butts, University of California -- Irvine
## Ayn Leslie-Cook, University of Washington
## Skye Bender-deMoll
## Li Wang
## Kirk Li, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("tergm").
A common alternate conceptualization when considering temporal paths in a discrete time temporal network is to use a time projected network (cite moody and earlier) representation. We imagine this as a static network constructed as an aggregate of the ordered sequence of the networks corresponding to each wave, with each vertex connected to its realization in the subsequent time network by a directed identity arc. A temporal path in a time projected network can reach forward in time only along the identity arcs and spread across the network via the regular network ties. The image below shows a projection of the seven timesteps of the trivial network and highlights in red some possible forward temporal paths originating from vertex A at the first timestep. NOTE: this plot would be static image if not on web.
## glX
## 1
A disadvantage of this representation is that if this network was describing a continous time process, the journey could also have crossed at time 1.5, or any intermediate time while the edge was active. This possibility appears to be prohibited in this model. For this reason, the time projection does not generalize easly to continous-time networks where observations have not been made in discrete waves and edges may have varying durations and onset times permitting an infinate number of possible traversal times.
This image also illustrates possibility of multiple forward temporal paths existing between pairs of vertices. In the image the link from A to B at time 0 is hilighted, and the second link at time 1 is not. Both of these are allowable journyes because we permit the edges of dynamic networks to have more than one activiation spell and to be active for multiple units of time. But even if we required that an edge between two vertices could only be active for a single discrete time step we still must allow for the possibility of jouneys involving different sets of vertices arriving at different times. In order to effectively specify a definitions of ‘shortest’ temporal paths, we will need to introduce more specific terminology for restrictions on path types.
The appropriate metric for measuring path length can vary widely across problems. For example, if we are representing transportation logistics schedules as a temporal network it is possible to use the global overview of the network to plan optimal journeys across the network. But we still need to define the quantity that the planner (or algorithm) is attempting to optimise (or minimize)? Are we looking for the earliest possible time we could arrive at a destination? The shortest possible time in transit? The latest possible departure time that still arrives within our bounds? The fewest number “hops” along the route (i.e. least number of plane changes)? The least duration of waiting at intermediate vertices?
To illustrate some of the possible path types, we can define a dynamic network consisting of the following edge spells to contrast five path basic types described by Moody (CITE).
The earliest leaving forward path would be journey that leaves the source vertex earliest. The earliest arriving forward path is the journey that first reaches the target vertex. The quickest forward path would be the journey with the least total duration. The latest leaving forward path remains on the source vertex as long as possible before departing. The latest arriving forward path We must include the “forward” qualifier, as each of these paths would also have a “backwards” counterpart. For each path, we assume that same criteria is applied when selecting the route and departure time from each intermediate vertex.
For the examples above, all of the paths are two geodesic steps in length, and we are only varying the activity timing of the edges. Real networks of sufficent scale will have paths varying in both temporal and geodesic dimensions so it is quite possible to have instances where there are shorter (fewer steps) paths that arrive later. It is important not to conflate the earliest path with the shortest path, as the units of measurement are different. Even the earliest shortest path is a distict concept probably requring a different algorithm to calculate. The example below contrasts multiple-step paths from A to C to illustrate this.
A-B-F-C (blue) is the earliest path, arriving at t=3 after three graph hops and three time steps. A-G-C (blue) is the shortest path (three hops, two timesteps), but that route can’t arrive at C until t=6. A-D-E-C is the quickest, traverseable in a single time step at t=4, but requring three hops. It is also the latest departing path and the latest arriving path, the only path that doesn’t require waiting at any vertices and has the widest departure window from A. This example illustrates the differences between gdist (number of graph hops, geodesic distance) and tdist (temporal distance, time elapsed from path start).
When dynamic networks applied for epidemic propagation problems we can generally make an assumption that the element being propagated is passive with respect to its transmission events - it is not able to plan its route. In these cases we are likely interested in the most probable paths of random walks. We might assume that transmission probability will be related to the total number and length of the possible journies. For the discrete case, an exhaustive enumeration of the paths may be possible tho computationally expensive. The continous case would seem to require some sort of analytic technique to sum.
The earliest forward temporal path from vertex i to vertex j can be thought of as the soonest possible time a message sent from i to j could possibly arrive, with the assumption that it also makes the earliest possible traversal to each of its intermediate verticees. Bui Xuan, et. al. (2002) refer to this concept as the foremost path. If an earliest arriving path exists, we know there must be at least one valid forward path, so we can use it to compute temporal reachability. We will see below, we can calculate the earliest path with relative efficiency using algorithms similar to those used to calculate the shortest geodesic path (Bui Xuan, et al. 2002).
The key feature is that much like “shortest geodesic”, “earliest arrival” is quantity that can be minimised with respect the the starting time point of the path. In other words, if we are searching through the network in temporal order and have found an earliest path from i to j at t = t0, no earlier path between them can be created at a later time t > t0. It is crucial to remember that it is often possible to discover shorter paths (fewer graph hops) between i and j at later times, it is just that such a path by definition would still occur later.
Although we have not derrived a formal proof here, it seems that this would follow the same logic as the proof of Dijkstra’s algorithm, substituting the difference between the onset times of sucessive edges for graph hops as the distance metric, and always choosing the current soonest vertex to check. Bui Xuan, et al. (2002) do provide a proof, although they use a slightly different algorithm.
The earliest arriving forward temporal path is convinient in that it appears to be well-defined for large class of network representations. It can be computed for networks using both discrete and continuous time models, for edges defined by momentary activation or explicit intervals of duration, and for networks that have either single or multiple edge spell activations betwen a vertex dyads.
Reversing the signs and the direction of optimization (starting and the end of the network and working backwards) it is possible to use a closely related algorithm to derive the latest departing backwards paths. This gives us the set of vertices v could be reached by and the latest times the journies could depart those vertices. Assume we don’t have an efficient way to find the latest arriving path, since it is a longest path (aka Traveling Salesman" problem. (CITE)
The networkDynamic package for R (Butts, et. al. 2012) provides an implementation of the dynamic network data model described above, in which each edge has an associated set of spells defining its activity. This makes it possible to construct an adaptation of the well known “Dijkstra” shortest path algortihm for weighted networks to compute the earliest forward paths. Essentially the edge weights, and hence the distance to be minimized, are replaced with the amount of time elapsed from the starting point of the path (temporal distance), rather than the number of hops in the graph traversal (path distance).
The for discrete networks, the algorithm can be thought of as computing a shortest path in the ‘time projected’ network (Moody 2008)–except that we are not limited to discrete time and the duration of the ‘self-edges’ need not be uniform.
# assume 'v' is vertex id of source vertex
# assume that the interval 'start' 'end' defines the observation window to be searched
# define vector of distances to all possible targets, default to un-reachable (Inf)
# define vector used for reconstructing the path, gives the previous (parent) of each vertex
# set temporal distance of 'v' to 0
# define a vector of unchecked vertices
# continue looping while any vertices remain unchecked
# chose a vertex 'u' to check, having the smallest temporal distance found so far
# remove the vertex 'u' from the list of unchecked vertices
# if the temporal distance associated with 'u' would place it outside the observation window bounds
# then no more vertices are reachable from 'v' within time range, so end the search
# check neighbors of 'u' by getting a vector e of connected edges
# for each edge in 'e' in the vector of 'u's edges
# choose 'w' as id of u's neighbor on the edge
# find the index of the earliest active spell of the edge 'e' intersecting with 'u's currently computed distances and 'end'
# (if we are using graph.step.time > 0, may need to search for a later spells here)
# if no active spells are found
# vertex 'w' is never reachable in the future from 'u' within time bound, so mark its distances as Inf
# if an active spell is found
# define distance u_w as the later of 0 or the difference between the
# currently computed 'u' time and the onset of the spell for edge 'e'
# (but if we are counting graph steps as part of the distance, distance can't be less than graph step)
# define dist_v_w as currenctly computed distance for 'u' plus distance u_w
# if distance_v_w is less than the previously computed distances for 'w'
# update the distance for 'w' to the new distance
# update the record for 'w's previous parent to 'u'
# return the vector of distances found and parents of each vertex
# un-reachable vertices will be marked with Inf
This algorithm, implemented in the tPath function of the tsna package (Bender-deMoll and Morris, 2015) is quite fast for temporally sparse networks because it is able to directly access the activiation spells for each edge, and it does not need to ‘visit’ any of the intermediate time points as an epidemic sim would. It can be modified to account for transmssion delays (graph.step.time) , and with early termination criteria if the goal is only to find the earliest pair-wise path from i to j (instead of the paths from i to all reachable vertices).
Note that in the case that there are two earliest arriving forward paths with exactly the same temporal distances, this algorithm will pick one of the two aribitrairly depending on the specific data structures and details of the implementation. The information on alternate paths is not recorded as it could be in a Breadth First Search algorithm.
Because of the posibility of a shorter (fewer hop) path occuring at a later time, this algorithm cannot be used to find the more general case of a shortest (geodesic) time respecting paths, although such paths can be found by related methods (Bui Xuan, et al. 2002).
This algorithm does not function in the presense of any negatively valued times, so if the starting point for the observation window is earlier than zero, all of the spell values must be shifted by a constant offset to be greater than zero before starting the computation
In the situation where all of the edges have identical activation spells (i.e. edge dynamics could be ignored and represented as static) the algorithm will reduce to a standard shortest path problem only if a temporal distance between sucessive edges is defined. An effective work around is to define a temporal cost to each edge transmission using a graph.step.time of 1.
A forward reachable set for v at t is the complete set of vertices that can be reached along forward temporal paths from a vertex v starting at time t. There must exist an earliest arriving path if a forward path exists (Bui Xuan, et al. 2002), so we can efficiently compute the forward reachable set using the earliest arriving path.
A forward reachable set can have an associated tPath, which is a record of a sequence in which vertices in a forward (or backwards) reachable set are reached along (usually multiple) journeys from a single source vertex, along with the timing of of each graph hop. A tPath defines a temporally permissiable spanning tree for the network, given a seed vertex and start time. A tPath could be considered as an extracted portion of the original temporal network such that there is only one possible path (gaurnteed by the tree structure) from the source to each of the (reachable) target vertices in the network. Since there is only one path and possible transmission time in a tPath, all of the temporal path types will be equivilent.
To illustrate the concept of a tPath we will look at some examples of possible paths in a small simulated discrete time network named toy_epi_sim (Bender-deMoll 2013). This 100-vertex network was constructed using a tergm simulation (Kirvitsky and Handcock 2015) parameterized so that it created a dynamic network with an average of 50 edges active at each timestep, and each edge averaging 10 timesteps in duration. We can get an idea of the momentary connectivity and spread of the forward paths in this network by looking at a series of snapshots of the network with the vertices that the path from vertex 11 has reached hilighted.
The small multiple view allows us to see that the path began to expand rapidly when it was able to reach the large component of the network – somewhere around time 18. If we have access to an animated view, this even clearer.
But this representation alone doesn’t make the sequence of infections evident at a glance. To get a clearer conceptual sense of the growth of the path, we can visualize the tPath as a tree reaching forward in time, with each vertex branching to link to the vertices it contacts.
This view mostly discards the timing information – we can see sequence of infections, but we can’t determine when the vertices were reached – so we are unable to pick out the time when the path encountred the component.
A hybrid “transmission timeline” view plots a linked tree of the parent-child relationships, but positions each vertex acording to the number of graph hops (gdist, generations) on the y-axis, and temporal distance from start on the the x-axis This places the seed node (vertex 11) near the origin in the lower left, with subsequent vertices appearing rightwards and above. We can see that around time 18, a dense diagonal shooting up as the path encountered the component. And we can see that the bridge was created by the formation of the tie between vertices 58 and 22.
This view has its own drawbacks. Depending on the network structure, it is possible for many of the vertices and edges in the tree to overlap, and infections that occur after long delays are hard to follow visually.
It also important to remember that the forward path being visualized is only one possible type, and only explored from a single vertex. Paths starting from other vertices (v69 for example) may only reach one or two others, while paths starting within a connected component may achieve a spreading burst more rapidly (v77).
Due the the variablity accross vertices, if we wanted to characterize an entire network we would need to compute the collection of tPaths from all vertices – or at least a significant sample of them.
Interestingly, data for constructing tPaths for real world transmission processes may already be availible in public health contexts. When working to contain serious infectious diseases, contact tracing interviews are often employed to track back from the set of known infected individuals to the possible infection source to locate additional exposed persons and determine some of the disease spreading properties (i.e. “Basic reproduction number”) even when the overall population contact network cannot be reconstructed.
For example, we can consider epidemic data derived from a measles outbreak in the town of Hagelloch, Germany in 1861 (Groendyke and Welch 2015, Neal and Roberts (2004)). 188 individuals were infected over the course of the epidemic. The underlying contact network is not observed, although it could be somewhat infered via household and classroom membership and proximity. Since this is a real epidemic, we must assume that there are delays between individual’s exposure and their becoming contagious or developing symptoms We can clearly see bursts in the epidemic, but it may be difficult to determine if they are caused by the infection finding paths to cross between densely connected communities or by the incubation period.
The burstiness would of course also be clearly visable if we were to just plot the cumulative number of cases. Or, to put in in a more generalizeable form, the cumulative ‘reach’ of the forward path expressed as a fraction of the total number of vertices (although in this case we don’t know the number of un-infected individuals in the community).
In this case we don’t know if the epidemic halted because it reached the boundries of the community or because preventive interventions (or everyone being to sick to walk) began to modify the contract structure of the community. Either way, we see the rate of new infections taper off at the end. This raises issues for calulating rates, as we will see below.
Although there are more nuanced methods for arriving a statistically valid estimate of a reproduction number, a crude approach is to simply compute the mean number of children for each vertex in the tPath.
TODO: model as a branching process? TODO: reproduction number as the mean degree in the tPath
The shortest path geodesic distance measure serves as the basis of a large number of network measures (cite Brandes). It seems useful to generalize some of the measures to temporal networks using the earliest forward path. (cite moody)
Several authors have proposed a ‘reachability graph’ projections of dynamic networks. This graph would be a representation of a dynamic network having the same vertices, but the original edges are replaced by directed arcs between vertices whenever it is possible to make a journey from one to the other. The earliest forward path provides an relatively efficient way to construct it.
Easy to compute the earliest forward set from any vertex. Compute a directed reachability matrix for any starting time. partition using directed component measures.
If the network has been observed long enough to be fully reachable, we will need to apply a threshold to the temporal distances reachability matrix in order to effectively break up the network
Compare the distribution of path lengths from aggregate, a single cross section, and earliest fwd paths
hist(as.vector(geodist(as.matrix(toy_epi_sim))$gdist),
main='histogram of path lengths in aggregate toy_epi_sim net',
xlab='geodesic path length',xlim=c(0,25))
hist(as.vector(geodist(as.matrix(network.extract(toy_epi_sim,at=10)))$gdist),
main='histogram of path lengths in momentary toy_epi_sim net at t=10',
xlab='geodesic path length',xlim=c(0,25))
hist(as.vector(gdistmat),
main='histogram of fwd path lengths in toy_epi_sim net',
xlab='earliest fwd path length',xlim=c(0,25))
We can calculate an temporal extension of a betweeness measure using the earliest forward paths rather than the geodesics on static graphs (Habiba, et. al. 2007) (Cite Moody) Cite Brandes (2008). This computes the number of shortest earliest forward paths that each vertex lies on. The standard definition of betweeness would then divide this by the number of shortest paths between each vertex pair in order to weight the shortest paths by the inverse of their redundency (Cite Butts sna). However, the method we are using to calculate the earleist paths doesn’t collect information on potential alternate paths of the same length, so we are currently limited to calculating the ‘stress centrality’ form of betweeness.
ASIDE: Since it is possible to define a betweeness score for vertices, it seems likely that it could be similarly defined for edges, which would lead to the possibility of temporal edge-betweenness clustering where the ‘transmiting’ spells of the highet betweeness edges are sequentially deleted and the scores are recalulated until the graph is partitioned. ’tho I’m not sure there is an immediate extension to a temporal modularity score.
CONTRAST SOONNESS centrality, tReach, and tiedDuration with simulated epidemic sizes
Defining the temporal bounds of the observed network evolution raise some significant issues. For many classes of networks – certainly most stochastic network models – if the simulation is run long enough the network will become fully temporally reachable. Eventually enough ties will have formed and disolved so that there is a temporally permissable path from every vertex to every other. So how can we determine the correct time window for measuring a network process? Somewhat similar to the “network boundry problem”" in determining the appropriate vertex set for a network. For real-world data, the temporal boundries are usually somewhat arbitrary, defined by the limits of an archive or resource constraints of data collection.
The assumption here is that we care about comparing transmission rates in networks over short enough time scales, or we would like to know how long network process would run in order to ensure reachability. Does the longterm evolution of the network imply that all the vertices are eventually mutually reachable? Or are there structual barriers (weakly connected components in ghe graph) that act as barriers to mixing? Is the observation window long enough so that we can distinquish between seprate
if a tie toggles on and off a lot very quickly, from the perspective of a transmission process its temporal connectivity approaches that of a ‘weak’ tie that is always active.
network growth process, impact of vertex vital dynamics
It seems that there should be a measure that allows us to compare ‘how good at spreading’ different networks are. Standard measure of network density is impacted by network size. Larger networks have lower density because the number of potential dyads is greater, even if the amount of connectivity expeireced by individual vertices are similar. A ‘size invarient’ representation of density to express the number of ties from the perspective of individual vertices and make comparsions of different sized networks using mean degree of vertices instead of density. This suggests similar transformation for rates, expressing rate of change as an expected mean time until an edge toggle. Possible to calculate this from sampled egocentric data even when the network for a full population cannot be collected?
Ideally the values for the metrics shouldn’t depend on how long we observe the network. Having a longer observation period should reduce the error in measurement, not change the values of coefficients we get back.
Crude measure is simply churn, how many edges are formed and broken. But a network could have a very high churn rate by forming and breaking the same edges without achieving any new connectivity. Seems like for this what we want to find is the rate at which paths are able to discover new vertices.
How quickly is the network mixing? many possible ways to quantify the mixing: * How long does it take to reach X fraction of the network? Problem. The network may not be reachable within the observed time period. * Fracition of sampled t paths that reach the maximum size within a given time period * Mean rate at which ‘new’ vertices are reached by forward paths.
As long as we are thinking about these as processes where infect.prob=1.0, we can efficiently calculate them with earliest forward path.
“mean size of infected component at t=100” or “mean time to reach 50% of vertices” are hard to compare across networks if time units don’t match. Or, even if units are comparable, the observation period for one network may be too short.
impacts of tie duration? vs length of observation window. li’s work http://students.washington.edu/lxwang/summary%20plots.2.html
Assuming that the network is not a growth process (with new verticies being added over time) and is fairly homogenous in rates the size of a network works to bound the maximal growth of the forward components following the logistic curves that describe epidemic growth. As the number of vertices reached grows over time, the “front” of vertices able contact unreached vertices increases, but after a significant fraction of the population has be contacted, it becomes increasingly unlikely that the few remaining vertices will happen to be found, and the spread slows down.
Several interesting features stand out in these traces. There is wide variation (and often interesting parallelism) in the trajactories from individual vertices. A few paths may start from isolated vertices and never reach anyone else. For this example network, which contains a largish component for much of its evolution, there is often a big “stairstep” when the path manages to reach that component and is able to quickly jump across it (we also saw this in the Hagelloch plot). None of the traces is able to reach 100% of the network, because the network contains four isolates. Although a few paths have likely not reached the full forward reachable set by the end of the observation window, all of the paths show a decrease in rate as they approach saturation at the maximum possible size.
The saturation effect means that if we want a discriptive statistic for the transmission potential of a network over time, we may not want to start all of the seeds for the forward paths at the beginning of the observation period. To characterise the overall observed network accurately we likely need to choose random starting times in addition to random seeds and calculate the rate based on the amount of time each path was given to spread.
Alternate possiblity: “wrap-around” infections – when the end of the time window is reached, start searching again from time 0, from all of the vertices reached so far. Problem then is stopping criteria.
The mean reachable rate for a network, we select a number of seed vertices at random, and for each seed select a random starting time within the network’s observation period. For each seed, compute the size of the forward reachable set (starting from the associated time) and divide by the duration of time remaining for the seed to spread. Closely related to the “reproduction number” or “R naught” in epidemiology or population ecology.
Comparing values for toy_epi_sim computed starting all paths from zero with starting at random times. The random start gives larger values because the “slow tail” part of the path is trimmed off for more networks. This value is not necessairly “more accurate” (it may be an over-estiamte) but it should be less suceptible to window effects of observation duration.
rRate(toy_epi_sim,sample.size=100,start=0,end=26,random.times = TRUE)
## [1] 4.910926
rRate(toy_epi_sim,sample.size=100,start=0,end=26,random.times = FALSE)
## [1] 3.351538
Aside: Interesting question about sample validity. Seeds are chosen at random but unless the population size is very large, it is likely that the forward reachable components found from each seed are not independent. If there is a region of the graph that is strongly temporally connected, it is likely that many of the seeds will discover nearly the same set of reachable vertices. Does this mean that large FRS are likely to over-represented in samples on graphs that have them?
in the amount of time it takes on average for the network to toggle one edge on or off, how many new vertices can be reached on average?
The mean reachable rate provides an index of the relative rate of potential spread vs the time units of the network itself. So it may be suitable for comparing networks using the same time-units. However, it is not a unit-free “rate invariant measure”. It cannot distinguish between when diferences in the values are caused by topological differences in network connectivity or simply that one network evolves at a faster rate (or is measured in larger time units) than another. The mean reach per toggle divides the mean reachable rate compute the average amount of time (per capita) until a the next edge toggle in the network. Mean reach per toggle is a measure of how often new ties form with new vertices vs already reached vertices. A value of 0 would mean that edges only toggle on and off between the same vertex pairs, but no new mixing occurs. A value of 1 would mean that every toggle reaches a new vertex, and values > 1 would mean that every toggle connects to a set of previously unreachable vertices.
The table below provides some very basic statistics for comparison among several simulated and observed networks.
| net.names | reachable.rate | net.size | net.dur | mean.edge.dur | time.to.toggle | reach.per.toggle |
|---|---|---|---|---|---|---|
| base.sim | 4.8125389 | 1000 | 100 | 20.159115 | 3.245699e+01 | 0.1482743 |
| middle.sim | 0.6289240 | 1000 | 100 | 19.411888 | 3.221649e+01 | 0.0195218 |
| monog.sim | 0.3725135 | 1000 | 100 | 19.974278 | 3.267974e+01 | 0.0113989 |
| short.stergm.sim | 1.0065759 | 16 | 25 | 10.531250 | 4.255319e+00 | 0.2365453 |
| toy.epi.sim | 5.0517274 | 100 | 25 | 6.861538 | 8.361204e+00 | 0.6041866 |
| hospital.rfid.contact | 0.0002430 | 75 | 347520 | 569.341528 | 9.284035e+02 | 0.0000003 |
| manufacturing.co.emails | 0.0000161 | 167 | 23426882 | 14.283022 | 2.367841e+04 | 0.0000000 |
| enron.emails | 0.0000021 | 184 | 141075619 | 12.201920 | 2.744254e+05 | 0.0000000 |
Example plots of momentary networks and Cumulative time to reach plots
real networks often have periodicity, at multiple time scales
Enron, Polish manufacturing continusous time
BART (a network designed to transmit efficiently)
Epidemilogical infection models of real disease are necissairly complicated in order to account for lots of crucial details of disease process, changes in viralance over the duration of infection, mortality, recovery etc. Exact values for many crucial parameters (HIV infectivity) are not known and are difficult to research ethically. How far can we get just be understanding the maximal infection sizes permitted by the underlying contact network?
Earliest fwd path provides hard upper bound on infection reach, transmission tree is not necessiairly the substrate that SI model spreads across. Many networks will include alternate paths (including shortest and most likely) which are not the earliest forward path. How often is this the case for networks that we care about (‘realistic’ disease transmission networks) (are the paths in these networks constrained enough that earliest path gives an easy to calculate huristic)?
Earliest fwd path algortihm can be considered an SI process with infect. prob = 1.0, it not obviously well suited to modeling other infection rates as its efficiency is partially gained by its ability to avoid ‘visiting’ any timesteps intermediate to changes in the network tie structure – these steps would generally be needed to evaluate probabalistic computations of infection status.
One could assume that for many networks, a lower tdist would correspond to an increased likelyhood of a vertex being reached in an infection process. Or more precisely, a large tdist means that we know a vertex is not reachable until late in the network’s evolution, so even if the infection does not travel on the earliest fwd path, there is much less time remaining for vertex to be reached by alternate routes.
measuring the size of an epidemic, generation time, fraction reached at t, vs time to reach a fraction
For discrete time network models, we can use discrete time epidemiologial transmission simulations as an alternate means to compute the forward reachable set. For this paper, we adapted Samuel Jenness (2015) modules for simulating epidemics on observed networks using the EpiModel package (Jenness, et al 2015) to compute SI infection trees as tPaths while permitting us to vary the infection propability rates and run batches of simulations from multiple seeds in a single network realization.
When comparing the earliest path results with a discrete time infection model, it is important to be explicit about the assumptions of the duration required for a transmission to occur (or be observed). By default, the path calculations in tsna and in analytical approaches to computing reachable sets usually assume transmission can occur instantly across edges that are active at the same time. A discrete time epidemic model usually requires that propagation can only occur one edge per time unit – initial infection and re-transmission cannot occur in the same timestep. Habiba, et. al. 2007 also reference this concept, making the distinction between a time respecting path (all time labels on edges are non-decresing) and strictly time respecting path (all time labels are increasing).
This distinction is crucial when a forward reachable path encounters an existing component or densly connected region in the network. If the graph.step.time parameter is assumed to be zero, the entire component can be reached in a single instant, greatly accelerating the path’s reach. This may be quite appropriate if the phenomena being modeled is one where the rate of transmission is dramatically faster than the rate of network change (electricity distribution networks for example) but will mean that discovery times and reachable set may not match up with an epi-style transmission sim.
The boundry conditions are also important, they need to stop at the same timestep to give the same results.
Contrasting transmission trees image
How useful is the distribution of forward component sizes at predicting the distribution of epidemic sizes? There are potentially many ways to measure risk to a population from randomly seeded infections. Do we compare the sizes of the worst-case-scenario (maximum possible infection size)? An average of a sample of infections sizes? For now, we will compare the shapes of distributions of infection sizes in the three “concurrency comparison networks”.
The first distribution is for infections with an infect.prob of 1.0 (which is the same thing as the forward component size). We also show the distribution for inf.prob of 0.8, 0.5 and 0.2, (the last of which is perhaps in the plausible realm of human contagious diseases). For each of the three networks we choose a set of seed vertices at random. Multiple simulations are computed from each seed with varying infection probability. TODO: detail the number of runs of each case.
First we consider the ‘base’ network and compare the sampled distributions of infected component sizes for the four infectivity levels.
It appears that lowering the infectivity of the transmission process shifts the distribution of epidemic sizes more and more towards shorter, smaller epidemics when compared to the earliest fwd set sizes. Presumeably this is because completing the number of graph hops needed to create the larger infections become less and less likely as the infection probability is lowerd and the effects of network strucutre are diminished. Less likely to make it far enough to actually hit one of the bottlenecks that that opens up a big new region of the network.
We see similar trends for the two other networks with differing momentary degree distributions – allthough the ‘monog’ network the epidemic sizes are quite small to begin with
To examine the effect of the reduced infectivity on the potential transmission trees from individual vertices we can compare the mean value of the reachable set . pick the same seeds for the different infect probs on the same network so we can look at correlations of the reachable set size and infection size.
For each seed in the network, compute the mean infection size across the trials.
The scaling relationships seem to hold true across all three network condiations, just the overall size is dramatically reduced As inf.prob gets lower, the error bounds seem to get wider.
Although predicting the the distribution of infection sizes for low-probability transmission processes from the distribution of forward reachable set sizes may not yet be analytically tractable, we can definitly put some bounds on the expected sizes for individual seeds. In addition, we can certainly distinguish between the transmission potential of the three networks, something which might be quite difficult to do looking at their momentary characteristics. Of course, the networks examined are effectively ‘random’ within their parameter bounds. These expectations may not hold across other classes of networks (i.e. transit schedulaing)
TODO: can we simulate some networks with the same momentary degree distribution and edge durations where we have artifically reduced the transmission potential by increasing the local clustering (randomly distribute an attribute and add a nodematch term?). Also a network where we artificially increase the mixing (like a transit network) change partners in a deterministic round-robin?
How useful is the earliest fwd path at finding the set of vertices most likely to be reached by infections?
There are number of factors that determine why some vertices may have a greater potential to infect the network than others. Some possibilities include:
I expect that which of these effects are stronger may depend on properties of the infection process and the network
Are vertices with a low average tdist (when computed from a census or sample of other vertices) more likely to get hit? (or hit)
For the simulation runs done previously
## [1] 0.5265942
TODO: move the correlations into figure captions or text
cor(base1v1.0[,2],baseAggDeg)
## [1] 0.5265942
cor(base1v1.0[,2],100-baseEarly)
## [1] 0.7603958
cor(base1v1.0[,2],baseSoon)
## [1] 0.9007079
cor(base1v1.0[,2],baseTiedDur)
## [1] 0.4583467
Sooness is the most strongly correlated, followd by the earlyness of ties. Although, it is kind of silly to include sooness, since it effectively calculates the inf.comp size anyway so it would be strange if it wasn’t highly correlated. For this network, seems that becoming infectious early on may be more correlated with the size of the infection originating from a vertex than the total number of partners. Repeat for other network types.
Real-world datasets for sex contact networks are difficult to collect collected (’tho I think there is a dataset of sex-worker-client hookups from online dating service, Colorodo Springs). Can be modeled from ego nets. Can be used to efficiently estimate infection potential of various network models. Goodness-of-fit statistics for network models with explicit temporal components “Reality mining” contact network datasets increasingly common, may be feasible for things like influenza.
Bajardi P, Barrat A, Natale F, Savini L, Colizza V (2011) “Dynamical Patterns of Cattle Trade Movements”. PLoS ONE 6(5): e19869 http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0019869
Skye Bender-deMoll (2013). ndtv: Network Dynamic Temporal Visualizations. R package version 0.9 (2015). http://statnet.org http://CRAN.R-project.org/package=ndtv
Skye Bender-deMoll and Martina Morris (2012). tsna: Tools for Temporal Social Network Analysis. R package version 0.2. http://statnet.org http://CRAN.R-project.org/package=tsna
B. Bui Xuan, Afonso Ferreira, Aubin Jarry. Computing shortest, fastest, and foremost journeys in dynamic networks. RR-4589, 2002. https://hal.inria.fr/inria-00071996/document
Brandes, U. (2008). “On Variants of Shortest-Path Betweenness Centrality and their Generic Computation.” Social Networks, 30, 136–145. https://kops.uni-konstanz.de/bitstream/handle/123456789/5940/variants.pdf
Carter T. Butts, Ayn Leslie-Cook, Pavel N. Krivitsky and Skye Bender-deMoll (2015). networkDynamic: Dynamic Extensions for Network Objects. R package version 0.9. http://statnet.org http://CRAN.R-project.org/package=networkDynamic
K.L. Cooke, E. Halsey, “The shortest route through a network with time-dependent internodal transit times”, J. Math. Anal. Appl. (1966) 493.
C. Dube, C. Ribble, D. Kelton and B. McNab. (2008) “Comparing network analysis measures to determine potential epidemic size of highly contagious exotic diseases in fragmented monthly networks of dairy cattle movements in Ontario, Canada.” Transboundary and Emerging Diseases 55 (2008) 382–392 http://www.ncbi.nlm.nih.gov/pubmed/18840200
Chris Groendyke and David Welch (2015). epinet: Epidemic/Network-Related Tools. R package version 2.1.6. http://CRAN.R-project.org/package=epinet
Habiba, C. Tantipanananadh, and T. Y. Berger-Wolf. (2007) “Betweenness centrality in dynamic networks”. Technical Report 2007-19, DIMACS, 2007 http://compbio.cs.uic.edu/~habiba/HabibaTantipathananandhBerger-Wolf-BetweennessMeasure.pdf
Krivitsky P and Handcock M (2015). tergm: Fit, Simulate and Diagnose Models for Network Evolution Based on Exponential-Family Random Graph Models The Statnet Project ( http://www.statnet.org). R package version 3.4-14107.1-2015.10.18-17.37.57 http://CRAN.R-project.org/package=tergm
Moody, James. (2002) “The Importance of Relationship Timing for Diffusion.” Social Forces 81:25-56
Moody, James (2008) “Static Representations of Dynamic Networks” Duke Population Research Institute On-line Working Paper Series. http://papers.ccpr.ucla.edu/papers/PWP-DUKE-2009-009/PWP-DUKE-2009-009.pdf
Neal, P. and Roberts, G. (2004). Statistical inference and model selection for the 1861 Hagelloch measles epidemic. Biostatistics 5 (2), 249.
Samuel Jenness (2015) Modeling Epidemics over Observed Networks R workbook http://statnet.github.io/gal/empnet.html
Samuel Jenness, Steven M. Goodreau and Martina Morris (2015). EpiModel: Mathematical Modeling of Infectious Disease. R package version 1.2.2. http://epimodel.org/
Rajmonda Sulo, Tanya Berger-wolf, Robert Grossman (2010) “Meaningful selection of temporal resolution for dynamic networks” MLG ’10: Proceedings of the Eighth Workshop on Mining and Learning with Graphs p.127-136 http://compbio.cs.uic.edu/~chayant/papers/p127-sulo.pdf